Abstract

Rare disease registries (RDRs) are an essential tool to improve knowledge and monitor interventions for rare 
diseases. If designed appropriately, patient and disease related information captured within them can become the 
cornerstone for effective diagnosis and new therapies. Surprisingly, however, registries possess a diverse range of
functionality, operate in different, often incompatible, software environments and serve various, and
sometimes incongruous, purposes. Given the ambitious goals of the International Rare Diseases Research
Consortium (IRDiRC) by 2020 and beyond, RDRs must be designed with the agility to evolve and efficiently
interoperate in an ever-changing rare disease landscape, as well as to cater for rapid changes in Information
Communication Technologies. In this paper, we contend that RDR requirements will also evolve in response to a
number of factors such as changing disease definitions and diagnostic criteria, the requirement to integrate
patient/disease information from advances in biotechnology and/or phenotyping approaches, as well as the
need to adapt dynamically to security and privacy concerns. We dispel a number of myths in RDR development,
outline key criteria for robust and sustainable RDR implementation and introduce the concept of an RDR Checklist to
guide future RDR development.
Background

It is currently stated that more than 7,000 rare diseases
have been identified and reported, affecting approximately
6-8% of the global population, although sound data are
lacking. Rare diseases are therefore a public health issue
that requires an organised and systematic public health
response, including accurate data for surveillance and
monitoring, as well as for individual care. To obtain more
reliable rare disease
prevalence statistics in each country and to enable appro- 
priate therapeutic translational research, Rare Disease 
Registries (RDR) are central [1-3]. International patient 
RDR are also critical to the pharmaceutical industry and 
there is now a very strong sense of urgency for national 
and regional-based registries to become coordinated in 
order to feed into these international registries, which 
often underpin clinical trials [4]. Furthermore, registries 
will provide information on the natural history of specific 
disorders and provide gene variation and disease phenotype
data that will become increasingly important in
evaluating new therapies and in determining patient access
to what might be expensive treatments that often
have strict access criteria through Government subsidy
schemes. Unfortunately, to date, there are relatively few
established national disease registries [3,5,6]. Recently, the 
groups of EURORDIS-NORD-CORD issued a Joint Declar- 
ation of 10 Key Principles for Rare Disease Patient Regis- 
tries [7] and the European Union Committee of Experts on 
Rare Diseases published recommendations [8]. These prin- 
ciples are an invaluable guide for the creation of rare dis- 
ease patient registries as well as to shape policy. They 
complement the main user's guide in the field of registries 
for evaluating patient outcome [9]. A natural extension is 
to determine the metrics that could be used to measure the 
successful adoption of some of these principles. 

A review of rare disease literature raises some import- 
ant questions about RDR. First, there are semantic is- 
sues. For instance, what, if any, is the difference between 
a patient registry compared to a disease registry? How 
does this relate to a clinical registry? Is there any difference
between a disease repository, disease registry, contact
registry and a disease or patient database? What is
the difference between a research 'cohort' (e.g. EuroCYST
[10]) and an audit registry (e.g. the UK Renal Registry
[11])? What about their relatedness to national and
ethnic mutation databases (NEMDBs) or locus-specific
databases [12-14]? Second, what defines successful and
sustainable interoperability between registries? Third,
does the fact that an RDR system is available and permitted
for download satisfy the term open source software,
or should other criteria be considered? Fourth, does the
choice of software environment in which registries are
implemented affect the ability of an RDR to be customised,
extended or modified for evolving requirements?
Fifth, what levels of security are employed, are they
appropriate and is it possible to modify permissions and
access privileges dynamically according to changing
stakeholder needs? There are clearly some preconceived
notions of disease registry development, and in this paper
we dispel three of these myths. In doing so, we highlight
what we believe are important criteria that should be
taken into consideration when developing RDR. We introduce
the concept of an RDR Checklist to guide software
development and project management best practices, which
will allow rare disease stakeholders to better accommodate
critical design issues that impact decision making.

Dispelling rare disease registry development myths 
Myth 1: technology is not a stumbling block 

A commonly propagated message is that technical chal- 
lenges are insignificant hurdles in the development of 
RDRs [1,5] and some Information Technology experts 
assure the rare disease community that technology is 
not the stumbling block [3]. We contend that technol- 
ogy choices, software architecture design and software 
development practices, to name a few, have a dramatic 
impact on issues such as software sustainability, legacy 
software support, ease of software modification/ 
enhancements and interoperability. To emphasise the 
magnitude of the stumbling block facing software devel- 
opment in general, a recent European Union study con- 
sidered one in eight information technology projects 
truly successful, with the cost of project failure estimated
at €142 billion in 2004 [15]. This report lists a
number of technical reasons for this failure including: 
inappropriate architecture; insufficient reuse of existing 
technical objects; inappropriate testing tools; inappro- 
priate coding language; inappropriate technical method- 
ologies; lack of formal technical standards; lack of 
technical innovation (obsolescence); misstatement of 
technical risk; poor interface specifications; poor quality 
code; poor systems testing; poor data migration; poor 
systems integration; poor configuration management;
poor change management procedures; and poor technical
judgment. We contend that many of these technical
factors can manifest themselves in vendor lock-in
[16]. In summary, in RDR development as in all software
development, it is important to recognise that
technology can be, and often is, a stumbling block.

Myth 2: professional software developers are not required to develop disease registries

There is a significant difference between developing 
RDR that are grounded in professional software devel- 
opment processes versus under-resourced pilot projects 
that are undertaken to meet discrete internal user re- 
quirements with little, if any, engagement with external 
requirements/stakeholders. Professional software devel- 
opment is a complex undertaking that includes: i) ap- 
propriate software project management; ii) team-based 
software development; iii) well-structured, commented 
code; iv) version control; v) issue tracking; vi) documen- 
tation; and vii) software deployment instructions. In 
order to ensure value is delivered to the client, skilled 
software developers need to collaborate with end-users 
to produce working software which is technically excel- 
lent and builds in flexibility for modification should 
needs change [17]. 

Stakeholders undertaking RDR development should 
consider these issues so they can deliver viable software 
solutions while mitigating technical risk. In addition, it 
is instructive to examine a number of important consi- 
derations, such as whether the RDR should be a desktop 
application or an Internet-based application; developed 
on an open source or proprietary software platform; the 
use of a relational database management system or an 
alternative (e.g. unstructured) data storage system; and
the decision to deploy in a cloud environment or on 
physical ICT infrastructure. Other considerations are to 
ensure systems are capable of extensibility, interoper- 
ability, and security that are supported in a sustainable 
way. Once these decisions are made, a critical question 
becomes whether the chosen professional software development
team possesses the requisite skills and experience
to adequately support the decisions made. The
software development process requires expertise, and it
is costly and time consuming. It is interesting to note
that in a self-reported survey undertaken by TREAT-NMD
(http://www.treat-nmd.eu), the costs associated
with developing national Spinal Muscular Atrophy
registries in more than 30 countries, using a defined set
of common data elements, were widely variable, with
some registries being established with less than €3,000,
while others had funds in excess of €250,000. The median
amount of money invested to set up a registry was
€20,000 (Blanden, personal communication 2013).

Myth 3: open source is easy

Making software available for download is a relatively 
straightforward process. However, the simple ability to 
download software should not be confused with the 
complete and more complex process of open sourcing 
software. There are many instances of open source soft- 
ware that are difficult to install, come without detailed 
download instructions, release notes, version control, or 
documentation, and either do not work or fail with no 
available ongoing support. The quality of open source 
software relates to process: appropriate levels of documentation,
strategies to capture community feedback,
open and transparent installation processes, and a detailed
deployment process. It is important to recognise
that when a software team decides to open source soft- 
ware, they are not only making available their software 
to the broader community, they are also externalising 
their internal software development processes. This can 
be a paradigm shift in a software team's operations as 
processes that come under scrutiny include deployment, 
testing, issue tracking, and accepting patches from the 
community. Open sourcing software is not as simple as 
uploading source code onto the Internet. 

System overview 

A new approach to the design of disease registries has been
developed to address access, security and privacy, as well
as the needs of clinical sites across a given country. The
Rare Disease Registry Framework (RDRF) enables access to,
and registration of, patients with clinical and genetic data,
often arising from different geographical locations. The
approach adopted is readily applicable to other rare
diseases [18].

Modular rare disease registry framework 

The RDRF has been designed and implemented so that
common features can be shared between registries. These
common features include common data elements within
what are referred to as base modules. Base modules might
include Patient Details, Medical History, Diagnosis Information,
Genetic Variation and Working Groups, and have
agreed Common Data Elements (CDEs) that provide
conformity/interoperability of the data fields across platforms.
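
The relationship between base modules, CDEs and registry-specific extensions can be sketched in Python as follows. This is a minimal illustration only, not the RDRF's actual data model: the class names, the `extended` method and every CDE code shown are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommonDataElement:
    """One agreed data field (hypothetical structure, not the RDRF schema)."""
    code: str
    label: str
    datatype: str


@dataclass
class Module:
    """A named group of CDEs that a registry can load."""
    name: str
    elements: list = field(default_factory=list)

    def extended(self, name, extra):
        """Return a new module holding this module's CDEs plus extra ones.

        The base module itself is left unchanged so other registries can
        keep using it.
        """
        return Module(name, self.elements + list(extra))


# Hypothetical base module shared by all registries.
DIAGNOSIS_BASE = Module("Diagnosis Information", [
    CommonDataElement("diagnosis", "Diagnosis", "string"),
])

# Hypothetical registry-specific extension of the base module.
REGISTRY_DIAGNOSIS = DIAGNOSIS_BASE.extended("Diagnosis Information (registry X)", [
    CommonDataElement("age_at_onset", "Age at onset (years)", "integer"),
])


def shared_codes(a, b):
    """CDE codes present in both modules, i.e. fields that map directly."""
    return sorted({e.code for e in a.elements} & {e.code for e in b.elements})
```

Under these assumptions, any two registries built from the same base module share at least the base CDE codes, which is what makes their data fields directly comparable.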

Patient Details and Medical History have been devised 
in consultation with patient advocacy groups. The CDEs 
conform to international standards such as TREAT-NMD. 
Base modules can be extended and customised for
individual registries. For example, diagnostic information
is tailored to each registry, since the required information
varies significantly. New modules that are required by a
specific registry can be contributed back to the base module
set for use by other registries as required. As new
registries are built, with each iteration and improvement,
modules are able to be seamlessly incorporated within
existing registries. As an example, a questionnaire module
was created for the Australian Myotonic Dystrophy Registry
(AMDR) which allows patients to directly enter information.
The information entered is held in a 'quarantined'
region prior to being validated by a clinician. Once validated,
this information is then incorporated into the registry.
As this module was created for AMDR, it can now be
loaded back into the other registries. This module can also
be customised for web-based patient registration and completion
of self-reported symptoms, which can be accepted
or modified by a clinician at the next patient appointment.
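
The quarantine-then-validate workflow described above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the AMDR module itself: the `PatientRecord` class, its method names and the example field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Validated registry data plus patient submissions awaiting review."""
    validated: dict = field(default_factory=dict)
    quarantined: dict = field(default_factory=dict)

    def submit(self, answers):
        """Patient-entered answers go to quarantine, never straight to the registry."""
        self.quarantined.update(answers)

    def review(self, accept, corrections=None):
        """Clinician accepts the listed fields, optionally with corrected values.

        Accepted fields move into the validated record; anything not listed
        stays in quarantine for a later review.
        """
        corrections = corrections or {}
        for name in accept:
            if name in self.quarantined:
                value = self.quarantined.pop(name)
                self.validated[name] = corrections.get(name, value)
```

A typical use would be a patient calling `submit({"fatigue": "daily"})` through a web questionnaire, followed by a clinician calling `review(["fatigue"])` at the next appointment.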

In a similar way, other modules such as two-factor
authentication for secure log-on, web-enabled consent and
phenotyping approaches can be 'plugged in' as required.
During the development of the RDRF, via professional
agile software development processes, refactoring
has created a number of common modules that
are shared between individual registries. Because of this
flexible modular design, and thanks to a collaboration with
the Universal Mutation Database team, we will now be
able to add specific genetic modules such as: predictions
of the pathogenicity of reported exonic [19] or intronic
[20,21] variations; genotype-phenotype correlations [22];
or even methods to facilitate new genotype-based therapeutic
approaches such as exon-skipping [23,24].

The RDRF graphical user interface is also modular so
it can be easily customised for a given rare disease (RD).
A specific example is in the neuromuscular domain where, even as
the national NMD registry grew, patient advocates from 
Spinal Muscular Atrophy (SMA) wanted a different user 
interface from Duchenne Muscular Dystrophy (DMD) 
or Myotonic Dystrophy. For SMA, the interface was 
modified to reflect real-time practice, i.e. it was aligned 
with how motor function is clinically captured by pa- 
tients and their clinicians. These nuances were able to 
be accommodated without the need to modify the 
underlying architecture of the RDRF. Both national and 
international RD registries have now been built using 
this framework and they all have been informed by fun- 
damental stakeholders such as patients and clinicians. 

The RDRF has been developed to be able to automat- 
ically de-identify data when exported. The Australian 
DMD registry feeds into the TREAT-NMD international 
registry and additionally, where appropriate, we have 
designed interoperability to connect the Myotonic Dys- 
trophy registry to both the TREAT-NMD core data and 
the Rochester Registries with equal degrees of interoper- 
ability for data exchange. 
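
The paper does not describe how de-identification on export is performed. One common approach, sketched below in Python purely as an assumption, is to drop direct identifiers, replace the local patient ID with a keyed-hash pseudonym, and generalise the date of birth to a year. The field names, the identifier list and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names; a real registry would define its own identifier list.
DIRECT_IDENTIFIERS = {"family_name", "given_names", "address", "email"}


def pseudonym(patient_id, secret_key):
    """Keyed hash: the same patient always gets the same export ID,
    but the local ID cannot be read back from it without the key."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a de-identified copy of a record; the input is not modified."""
    exported = {k: v for k, v in record.items()
                if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    exported["export_id"] = pseudonym(record["patient_id"], secret_key)
    # Generalise an ISO date of birth (YYYY-MM-DD) to the year only.
    if "date_of_birth" in exported:
        exported["year_of_birth"] = exported.pop("date_of_birth")[:4]
    return exported
```

Because the pseudonym is stable for a given key, repeated exports of the same patient can be linked by the receiving international registry without the national registry disclosing who the patient is.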

Security and multi-level access is a key feature in the RDRF

The registry framework has two levels of access control,
allowing fine-grained control of access: (i) Groups (user-level)
define the permissions granted to each user (functionality);
and (ii) Working Groups restrict the content
to which Group members have access. In addition to
Groups and Working Groups, permissions can also
be set on an individual user basis. Working Groups
might represent a clinic, hospital, a region or a state.
They are private, and data is not shared between Working
Groups. User groups such as treating Clinicians or
Geneticists are added to a particular Working Group to
allow them to work together. Within the RDRF, the security
model consists of several layers: SSL-based encryption
of all traffic; password access to accounts; and
logging of successful and failed user logins. In addition,
the RDRF can also be configured to provide in-built IP
address whitelisting and blacklisting, and two-factor
authentication. With these various levels of security, a
significant level of confidence can be provided to
end-users.
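
The two levels of access control can be sketched in Python as two independent checks that must both pass. This is an illustrative sketch, not the RDRF implementation: the permission names, the `GROUP_PERMISSIONS` table and the `can` function are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical permission sets per Group (what a user may do).
GROUP_PERMISSIONS = {
    "clinician": {"view_patient", "edit_patient"},
    "geneticist": {"view_patient", "edit_genetic_data"},
    "curator": {"view_patient"},
}


@dataclass
class User:
    name: str
    groups: set            # functionality, e.g. {"clinician"}
    working_groups: set    # content, e.g. {"Clinic A"}
    extra_permissions: set = field(default_factory=set)  # per-user grants

    def permissions(self):
        """Union of Group permissions and individually granted permissions."""
        perms = set(self.extra_permissions)
        for group in self.groups:
            perms |= GROUP_PERMISSIONS.get(group, set())
        return perms


def can(user, action, patient_working_group):
    """Allow only if the action is permitted AND the patient belongs to
    one of the user's Working Groups."""
    return (action in user.permissions()
            and patient_working_group in user.working_groups)
```

Keeping the two checks separate means that access can be changed by editing a user's Groups or Working Groups, without touching the code that enforces the rule.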

Interoperability 

A key dimension to consider is the effort required for 
RDR to be interoperable. Tedious manual and repetitive 
data transfer between systems is not scalable. Fortu- 
nately, we can leverage other significant efforts to intro- 
duce the concept of Degrees of Interoperability into 
RDR development discussions. Specifically, NATO have 
developed four levels of interoperability that would be 
appropriate for rare disease research [25]. 

• Degree 1: Unstructured Data Exchange. Involves the
exchange of human-interpretable unstructured data
such as the free text found in operational estimates,
analysis and papers.

• Degree 2: Structured Data Exchange. Involves the 
exchange of human-interpretable structured data 
intended for manual and/or automated handling, 
but requires manual compilation, receipt and/or 
message dispatch. 

• Degree 3: Seamless Sharing of Data. Involves the 
automated sharing of data amongst systems based 
on a common exchange model. 

• Degree 4: Seamless Sharing of Information. An 
extension of Degree 3 to the universal interpretation 
of information through data processing based on 
co-operating applications. 
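
To make Degree 3 concrete, the following Python sketch shows two systems exchanging a record automatically through a shared exchange model. It is a minimal illustration under stated assumptions: the common field names, the local-to-common mapping and both function names are hypothetical, and JSON is used only as a convenient serialisation.

```python
import json

# Hypothetical common exchange model: field names both registries agree on.
COMMON_FIELDS = {"export_id", "diagnosis", "year_of_birth"}

# Hypothetical mapping from one registry's local names to the common names.
LOCAL_TO_COMMON = {"id": "export_id", "dx": "diagnosis", "yob": "year_of_birth"}


def to_exchange_message(local_record):
    """Sender: translate a local record into the common model and serialise it."""
    common = {LOCAL_TO_COMMON[k]: v
              for k, v in local_record.items() if k in LOCAL_TO_COMMON}
    missing = COMMON_FIELDS - set(common)
    if missing:
        raise ValueError(f"missing common fields: {sorted(missing)}")
    return json.dumps(common, sort_keys=True)


def from_exchange_message(message):
    """Receiver: parse a message and reject anything outside the common model."""
    record = json.loads(message)
    unknown = set(record) - COMMON_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return record
```

No person needs to compile or re-key the message, which is the practical difference from Degrees 1 and 2; local-only fields are simply not transmitted.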

Understanding Degrees of Interoperability will enable 
decision makers, funders and research scientists to be- 
come aware of the effort required to sustain interoperabil- 
ity between RDR. For instance, if international registries 
require manual entry of unstructured patient/disease data 
(Degree 1) to interoperate with national registries, ultim- 
ately a decision needs to be made as to the financial viabil- 
ity to support this approach in the longer term, not to 
mention the known high risk of human error in this form 
of data exchange. If it is widely accepted, as it is in other 
fields, that Degrees 3 and 4 are the future directions for 
RDR harmonisation, strategic decisions can start to be 
made on interoperability not just between disease registries
but also with other systems relevant in translational
rare disease research, such as biobanks and integrative
-omics analysis [26].

Figure 1 Registry aggregation. A schematic of how disease/patient registries might be aggregated.

The RDRF has recently been customised for the group
of demyelinating nerve diseases (which includes both rare
and common forms of Multiple Sclerosis) and includes
facilities to upload MRI images and other additional information
necessary for this group of diseases. The
modular structure of the RDRF enables any RD registry
deployed to evolve over time and maintain consistency
with other registries. Because it is modular, it is relatively
easy to customise the user interface without any significant
software development effort.

Extensibility 

A number of enhancements of the RDRF are underway. 
First, longitudinal phenotypes are being captured by
time-stamping functionality that stores a static copy of
the patient record at a given time, before fields
within the patient record are modified. Second, the RDRF
is being refactored to enable aggregation of existing disease 
registries and enable the RDRF to be used for varied func- 
tionality as required, such as a patient registry, common 
registry, a disease-specific registry or even a clinical regis- 
try. A schematic of this approach is shown in Figure 1, 
which attempts to capture two broad existing registry 
domains, namely patient registries (country-centric) and 
disease/clinical registries (disease-centric). There is a need 
to aggregate disease registries and some complementary 
exemplars for this aggregation, based on disease ontologies, 
are suggested in Figure 1 and include: RASopathies; 
Demyelinating Diseases; Neuromuscular; Paediatric Neph- 
rological disease; familial cancers; and paediatric cancers. 
Similarly, patient registries need to be aggregated from re- 
gional through to national and international levels. The 
centre box of Figure 1 attempts to capture the concept that 
registry frameworks might serve multiple purposes. 
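
The time-stamping enhancement mentioned above (a static copy of the record taken before fields are modified) can be sketched in Python as follows. This is one possible design under stated assumptions, not the RDRF implementation: the `VersionedRecord` class, its methods and the example field are hypothetical.

```python
import copy
from datetime import datetime, timezone


class VersionedRecord:
    """Patient record that keeps a time-stamped snapshot before every change."""

    def __init__(self, fields):
        self.fields = dict(fields)
        self.history = []  # (timestamp of change, record as it was before the change)

    def update(self, changes, when=None):
        """Snapshot the current state, then apply the changes."""
        when = when or datetime.now(timezone.utc)
        self.history.append((when, copy.deepcopy(self.fields)))
        self.fields.update(changes)

    def as_of(self, when):
        """State of the record at time `when`.

        Each snapshot holds the state that was valid up to its timestamp, so
        the answer is the first snapshot taken after `when`, or the current
        fields if no later change exists. Assumes updates arrive in time order.
        """
        for stamp, snapshot in self.history:
            if when < stamp:
                return snapshot
        return self.fields
```

Retaining every earlier state allows a phenotype measure, such as a motor function score, to be followed longitudinally rather than being overwritten at each visit.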

Disease registry requirements change over time 

Registry system requirements evolve over time. For in- 
stance, a patient advocacy group might want to develop 
an initial general patient or contact registry for all dis- 
eases, which may need to morph into a registry for spe- 
cific needs. However, if the software cannot support this, 
then a separate registry/registries will need to be estab- 
lished. Similarly, a given disease registry for a neuro- 
muscular disorder might not be designed with other 
organ-specific clinical fields. If a neuromuscular patient on 
the RDR is diagnosed with a different rare disease (e.g. a 
haematological condition), should this additional clinical 
information be included in the existing neuromuscular 
disease registry, entered in a different registry, or both? 
Unfortunately, there is no systematic process to guide 
these decisions within the international rare disease community.
The same remark also applies at the genetic level,
which is frequently believed to be the "easy" part of the
data. However, recent technological advances are leading
to an evolution from single pathogenic variations associated
with a patient to an expanded set of exome/genome-wide
genetic variation.

It is not difficult to anticipate that access restrictions will 
change over time. For any given user of the system, be 
they a Clinician, Geneticist, Patient Advocate, Curator, or 
Allied health worker, decisions will constantly be revisited 
on who gets access and on what grounds, what can be 
accessed and how, and where and when access can be 
gained. Through modular development, new features can
be added and common modules created, but there is an
ongoing need to refactor the code [17]. The modularity of
the RDRF we have developed caters for ease in modifying
access privileges and adding/customising modules.

Table 1 RDR Checklist

1. Technology choices
• Web-based or desktop application
• Relational Database or unstructured data
• Programming Language
• Cloud deployment vs Physical ICT infrastructure
• Open source vs Proprietary

2. Professional Software Development
• Appropriate software project management
• Team-based software development
• Well-structured, commented code
• Version control
• Issue tracking
• Documentation
• Software deployment instructions
• Functional and Unit Testing
• Team-based development

3. Interoperable
• Export/import functionality
• Webservice API
• Data standards
• Ontology
  o Data elements
  o Disease elements

4. System design
• Customisable for
  o a specific disease(s)
  o patient registry
  o clinical registry
• Modular design
  o new features
  o new data elements
  o new ontologies

5. Security
• De-identification process
• Two factor authentication
• Multi-level user access
• Work groups
• Encryption

6. Sustainability
• Ease of exchange
• Effort required
• Future proofing

7. Open source
• Appropriate levels of documentation
• Strategies to capture community feedback
• Open and transparent installation processes
• Deployment process detailed

Rare disease registry development RDR Checklist.

RDRF checklist 

In accordance with the above, we propose a new Check- 
list for RDR development. These key criteria for consid- 
eration in future RDR development are outlined in 
Table 1. 

Conclusions 

The development of robust sustainable RDR is central to 
achieving the goals set by the International Rare Diseases 
Research Consortium (IRDiRC), which aim to have a diag- 
nostic test for most rare diseases and 200 new therapies 
by 2020. Contemporary thinking is that a disease registry 
is only as good as the quality of the patient and disease in- 
formation contained within it. In this paper, we contend 
that in the longer term, the quality of the system in which 
the data is contained becomes a significant bottleneck. 
There is a plethora of registries that differ in naming
convention and functionality. Traditionally, not all
registries have been developed with interoperability and security
in mind. Currently, there are significant overheads in
validating and subsequently synchronising patient data from
various regional, state-based and national registries into
international resources, as successfully demonstrated by the
TREAT-NMD network of excellence, which allows data collection
from more than 40 countries. Patients provide informed
consent, but unfortunately there is often
incongruence between patient information and what clinicians
and researchers require to assist in diagnosis and
treatment; or the information may be siloed and inaccessible
to appropriate allied health workers. This is not a viable
situation going forward.

Fortunately, from our experience it is possible to design 
robust and sustainable RDR and to cater for the capture of 
vital information as our understanding of disease pro- 
cesses dramatically improves, through major advances in 
biotechnology and phenotyping. The captured data can 
not only drive research and development, but also im- 
provements in clinical care, policy and population-wide 
outcomes for all people with rare diseases. 

Availability and requirements 

Project name: Rare Disease Registry Framework 
Project home page: https://bitbucket.org/ccgmurdoch/disease_registry
Operating system(s): Linux 
Programming language: Python 

Other requirements: PostgreSQL, Apache (mod_wsgi) 
License: GNU GPL v3 

Any restrictions to use by non-academics: No 